Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

📊 rework regions dataset to have unique indexes #1282

Merged
merged 12 commits into from
Aug 2, 2023

Conversation

larsyencken
Copy link
Collaborator

The regions dataset contains many join tables with one-to-many relationships. In our current strict dimensions model, we have two options:

  • Give them an integer index, or
  • Put every column in the index

This change does the latter. It's also meant to trigger discussion about what we want in general here.

The regions dataset contains many join tables with one-to-many
relationships. In our current strict dimensions model, we have two options:

- Give them an integer index, or
- Put every column in the index

This change does the latter. It's also meant to trigger discussion about
what we want in general here.
@pabloarosado
Copy link
Contributor

I think that having a table like aliases, with both code and alias as indexes is not very useful.
More generally, I think so far there has been no benefit of having many separate tables. Should we instead consider keeping only the regions table (which has a well-defined index)?

Base automatically changed from strict-mode-unique-index to master June 29, 2023 10:14
@pabloarosado
Copy link
Contributor

pabloarosado commented Jun 29, 2023

Hey @larsyencken I've removed all tables except for the regions table in the garden step, and adapted the grapher step accordingly.
There are still a few things that need to be refactored:

  • etl.harmonize
  • etl.data_helpers.geo
  • owid_co2 garden step
  • etl.fasttrack

@larsyencken
Copy link
Collaborator Author

Thanks Pablo!

@pabloarosado pabloarosado requested review from Marigold and removed request for pabloarosado July 28, 2023 09:53
@pabloarosado
Copy link
Contributor

Hi @larsyencken I think this PR is ready to be merged. I've removed all tables from the regions dataset except the combined regions table (which has a unique index and all data we need about regions), and I've removed all uses of those tables. I'll wait for @Marigold to check it out before merging (hopefully it doesn't clash with any of the other metadata refactors).

@larsyencken
Copy link
Collaborator Author

Good work!!!

Copy link
Collaborator

@Marigold Marigold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it looks good. However, changing regions triggers a huge ETL rebuild, so please merge it at the end of your workday to avoid disrupting others' work.

@pabloarosado pabloarosado merged commit 76e6f42 into master Aug 2, 2023
@pabloarosado pabloarosado deleted the regions-unique-index branch August 2, 2023 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants